Abstract


With the recent rise in popularity and scale of social media,
there is a growing need for systems that can extract useful
information from huge amounts of data. We address the issue of
detecting influenza epidemics. First, the proposed system extracts
influenza-related tweets using the Twitter API. Then, a support
vector machine (SVM) based classifier extracts only those tweets
that mention actual influenza patients. The experimental results
demonstrate the feasibility of the proposed approach (0.89
correlation to the gold standard). Especially at the outbreak and
early spread (early epidemic stage), the proposed method shows
high correlation (0.97), which outperforms the state-of-the-art
methods. This paper shows that Twitter texts reflect the real
world, and that NLP techniques can be applied to extract only
tweets that contain useful information.


1 = Introduction 


Twitter, a popular micro-blogging service, has received much
attention recently. It is an online network used by millions of
people around the world to stay connected to their friends, family
members, and co-workers through their computers and mobile
telephones (Milstein et al., 2010). The number of Twitter users
has increased rapidly. Its community, estimated at 120 million
users worldwide, posts more than 5.5 million messages (tweets)
every day (reported by Twitter.com in March 2011).
Twitter can potentially serve as a valuable information resource
for various applications. Huberman et al. (2009) analyzed the
relations among friends. Boyd et al. (2010) investigated
communication activity. Sakaki et al. (2010) addressed the
detection of earthquakes. Among the numerous potential
applications, this study addresses the issue of detecting
influenza epidemics, for which Twitter presents two outstanding
advantages over current methods.


• Large Scale: More than a thousand messages include the word
  “influenza” each day (Nov. 2008 to Oct. 2009). Such a huge data
  volume dwarfs traditional surveillance resources.


• Real-time: Twitter enables real-time and direct surveillance.
  This characteristic is extremely suitable for influenza epidemic
  detection because early stage detection is important for
  influenza warnings.


Although Twitter based influenza warnings potentially offer the
advantages noted above, they might also be affected by inaccurate
or biased information from tweets like the following (brackets []
indicate our comments):


• Headache? You might have flu. [Suspicions]

• The World Health Organization reports the avian influenza, or
  bird flu, epidemic has spread to nine Asian countries in the
  past few weeks. [General News]

• Are you coming down with influenza? [Question]


Although these tweets include mention of “influ- 
enza” or “flu”, they do not indicate that an influen- 
za patient is present nearby. We regard such 
messages (merely suspicions/questions, general 
news, etc.) as negative influenza tweets. We call 
others positive influenza tweets. In our experi- 
ments, 42% of all tweets that include “influenza” 
are negative influenza tweets. The huge volume of 
such negative tweets biases the results. 

This paper proposes a machine-learning based classifier to filter
out negative influenza tweets. First, we build an annotated corpus
of tweets paired with positive/negative labels. Then, a support
vector machine (SVM) (Cortes and Vapnik, 1995) based sentence
classifier extracts only the positive influenza tweets. In the
experiments, the results demonstrated a high correlation (0.89),
which is comparable to the performance of the state-of-the-art
method.


The contribution of this study is two-fold:

(1) This paper shows that an SVM-based classifier can filter out
the negative influenza tweets (F-measure=0.76).

(2) Experiments empirically demonstrate that the proposed method
detects influenza epidemics with high accuracy (correlation
ratio=0.89): it outperforms the state-of-the-art method.


2 Influenza Epidemic Detection 


The detection of influenza epidemics is a national mission in
every country for two reasons.

(1) Anti-influenza drugs, which differ among influenza types, must
be prepared before the epidemics.

(2) We can predict only to a limited extent what type of influenza
will spread in any given season.


This situation naturally demands the early detec- 
tion of influenza epidemics. This section presents a 
description of previous methods of influenza epi- 
demic detection. 


2.1 Traditional Approaches


Most countries have their own influenza surveillance
organization/center: the U.S. has the Centers for Disease Control
and Prevention (CDC), the E.U. has its European Influenza
Surveillance Scheme (EISS), and Japan has its Infectious Disease
Surveillance Center (IDSC). Their surveillance systems
fundamentally rely on both virology and clinical data. For
example, the IDSC gathers influenza patient data from 5,000
clinics and releases summary reports. Such manual systems
typically have a 1-2 week reporting lag. This time lag is
sometimes pointed out as a major flaw.


2.2 Recent Approaches 


In an attempt to provide earlier influenza detection, 
various new approaches are proposed each year. 

Espino et al. (2003) described a telephone triage 
service, a public service, to give advice to users via 
telephone. They investigated the number of tele- 
phone calls and reported a significant correlation 
with influenza epidemics. 

Magruder (2003) used the amount of over-the- 
counter drug sales. Because an influenza patient 
usually requires anti-influenza drugs, this approach 
is reasonable. However, in most countries, anti- 
influenza drugs are not available at the drug store 
(only hospitals provide such drugs). 

The state-of-the-art approach is that proposed by 
Ginsberg et al. (2009). They used Google web 
search queries that correlate with an influenza epi- 
demic. Their approach demonstrated high accuracy 
(average correlation ratio of 0.97; min=0.92; 
max=0.99). Several research groups have used
similar approaches. Polgreen et al. (2008) used a 
Yahoo! query log. Hulth et al. (2009) used a query
log from a Swedish web site.

Although the above approaches use different information, they
share the same idea, which is to observe patient actions directly.
This idea made it possible to obtain far more data than
traditional services. Nevertheless, such information is
unfortunately available only to the service providers. For
example, web search queries are available only to a few companies
such as Google, Yahoo!, and Microsoft.

This paper examines Twitter data, which are widely available. Note
that Paul and Dredze (2011) also proposed a similar Twitter based
approach. While they focus on word distributions, this paper
employs sentence classification (discrimination of negative
influenza tweets).


3 Influenza Corpus 


As described in Section 1, it is necessary to filter out negative
influenza tweets to estimate the scale of influenza epidemics
precisely. To do so, we constructed the influenza corpus (Section
3). Then, we trained the SVM-based classifier using the corpus
(Section 4).

The corpus comprises pairs of sentences and a 
label (positive or negative). Several examples are 
presented in Table 1. This corpus was built using 
the following procedure. 


3.1 Influenza Tweet 


First, we collected 300 million tweets, posted from November 2008
to June 2010, via the Twitter API. Crawling results are presented
in Figure 1. We extracted only influenza-related tweets using a
simple word look-up of “influenza”. This operation gave us 0.4
million tweets. We separated the data into two groups.

Training Data are 5,000 tweets sent in Novem- 
ber 2008. These were annotated by human annota- 
tors, and were then used for training. 

Test Data are the other data. They were used in 
experiments of influenza epidemics detection. Be- 
cause of the three dropout periods (Figure 1), the 
test data were separated into four periods (winter 
2008, summer 2009, winter 2009, and summer 
2010). 
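The look-up and the training/test separation described above can be sketched as follows. This is only a minimal illustration, not the authors' code: the function names, the case-insensitive substring match on the English word, and the rule that every matching November 2008 tweet goes to training are assumptions (the paper states only that 5,000 tweets from that month were annotated).

```python
from datetime import date

KEYWORD = "influenza"  # assumption: plain case-insensitive substring match


def is_influenza_related(text, keyword=KEYWORD):
    """Return True if the tweet text contains the look-up word."""
    return keyword in text.lower()


def split_corpus(tweets, train_year=2008, train_month=11):
    """Split (date, text) pairs into training and test data.

    Tweets without the keyword are dropped. Matching tweets posted in the
    training month become training data; all others become test data.
    """
    train, test = [], []
    for day, text in tweets:
        if not is_influenza_related(text):
            continue
        if (day.year, day.month) == (train_year, train_month):
            train.append((day, text))
        else:
            test.append((day, text))
    return train, test


sample = [
    (date(2008, 11, 12), "A bad influenza is going around in our lab."),
    (date(2008, 11, 13), "Lunch was great today."),
    (date(2009, 1, 20), "Influenza is now raging throughout Japan."),
]
train, test = split_corpus(sample)
print(len(train), len(test))
```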


3.2 Positive-negative Annotation 


To each tweet in the training dataset, a human an- 
notator assigned one of two labels: positive or neg- 
ative. In this labeling procedure, we regarded a 
tweet that meets the following two conditions as 
positive data. 


Condition 1 (The Tweeting Person or Surrounding Persons have
Flu): one or more people who have influenza should exist around
the person who posted the tweet. Here, we regard “around” as a
distance within the same city. In cases in which the distance is
unknown, we regard the tweet as negative. Because of this
annotation policy, re-tweet type messages are negative.
Figure 1: Twitter Data used in this Study.
The data include three dropout periods because the Twitter API 
specifications changed in those periods. The dropout periods 
were removed from evaluation in the experiments (Section 5). 


Table 1: Corpus (Tweets with a Positive or Negative Label)

Label  Tweet
+1     A bad influenza is going around in our lab.
+1     I caught the flu. I was burning up.
+1     I think I'm coming down with the flu.
+1     It's the flu season. I had it and now he does.
+1     Don't give me the flu.
       (Nearby people have the flu)
+1     My flu is worse than it was yesterday.
-1     In the normal flu season, 80 percent of deaths occur in
       people over 65
       (Simply a fact)
-1     Influenza is now raging throughout Japan.
       (Too general.)
-1     His wife also contracted the bird flu, but has recovered.
       (Where is his wife?)
-1     You might have the flu. Has anyone around you had it?
       (Where are you?)
-1     Bird flu damage is spreading in Japan.
       (Too general.)

“+1” indicates a positive influenza tweet. “-1” indicates a
negative influenza tweet. The text in parentheses “()” indicates
the reason for the positive or negative annotation.
Figure 2: Feature Representation (the context words around the
influenza related term).
The word boundary is detected by the morphological analyzer JUMAN.
Condition 2 (Tense/Modality): The tense should be the present
tense (current) or recent past. Here, we define the “recent past”
as the prior 24 hour period (such as “yesterday”). The sentence
should be affirmative (not interrogative and not subjunctive).


4 Influenza Positive-negative Classifier


Using the corpus (Section 3), we built a classifier that judges
whether a given tweet is positive or negative. This task setting
is similar to sentence classification (such as spam e-mail
filtering and sentiment analysis). We used a popular means for
sentence classification, which is based on a machine learning
classifier under the bag-of-words (BOW) representation (Figure 2).
The parameters were investigated in preliminary experiments in
terms of feature window size (Section 4.1) and machine-learning
methods (Section 4.2). These preliminary experiments were
conducted using ten-fold cross-validation on the training set.
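The evaluation protocol above (ten-fold cross-validation scored by F-measure) can be sketched in plain Python as follows. This is a hedged illustration under stated assumptions: the fold assignment by index modulo k, the use of the balanced F1 form of the F-measure, and the `train_fn` interface (a function returning a predictor) are all hypothetical choices, since the paper does not describe its tooling.

```python
def f_measure(gold, predicted, positive=1):
    """Balanced F-measure (F1) of the positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n_items, k=10):
    """Yield (train_idx, test_idx); fold f tests items f, f+k, f+2k, ..."""
    for fold in range(k):
        test_idx = [i for i in range(n_items) if i % k == fold]
        train_idx = [i for i in range(n_items) if i % k != fold]
        yield train_idx, test_idx


def cross_validate(items, labels, train_fn, k=10):
    """Average F-measure over k folds.

    train_fn(train_items, train_labels) must return a callable that maps
    one item to a predicted label (hypothetical interface).
    """
    scores = []
    for train_idx, test_idx in k_fold_indices(len(items), k):
        model = train_fn([items[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        predicted = [model(items[i]) for i in test_idx]
        gold = [labels[i] for i in test_idx]
        scores.append(f_measure(gold, predicted))
    return sum(scores) / len(scores)
```

Any classifier can be plugged in through `train_fn`; a trivial one that always answers +1 is enough to check the plumbing.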


4.1 Feature (window size)


Performance was dependent on the window size (the number of
left/right side words). Figure 3 depicts the performance obtained
using various window sizes. The best performance was scored at the
BOTH=6 setting. Therefore, this window size was used for the
following experiments. These results also indicated that entire
sentences (BOTH=∞) are unsuitable for this task.
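A window-limited bag-of-words feature of this kind can be sketched as below. It is an illustration only: the whitespace tokenization stands in for the JUMAN morphological analysis used in the paper, and the function name, the exclusion of the target term itself, and the use of `None` for the unlimited window are assumptions.

```python
def window_features(tokens, target_index, window=6, side="BOTH"):
    """Bag of words within `window` tokens of the influenza term.

    side is "LEFT", "RIGHT", or "BOTH"; window=None means all words in
    the chosen direction(s). The target term itself is excluded.
    """
    if window is None:
        start, end = 0, len(tokens)
    else:
        start = max(0, target_index - window)
        end = min(len(tokens), target_index + 1 + window)
    left = tokens[start:target_index] if side in ("LEFT", "BOTH") else []
    right = tokens[target_index + 1:end] if side in ("RIGHT", "BOTH") else []
    return set(left) | set(right)


tokens = "i think i'm coming down with the flu".split()
target = tokens.index("flu")
print(sorted(window_features(tokens, target, window=2)))
```

A set is used because a bag-of-words feature of this size rarely repeats a word; counts could be kept instead with `collections.Counter`.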


4.2 Machine Learning Method 


We compared various machine-learning methods from two points of
view: accuracy and training time. The results, presented in Table
2, show that the SVM with a polynomial kernel achieved the highest
F-measure with a short training time.
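For readers unfamiliar with the kernel selected here, a degree-2 polynomial kernel over sparse bag-of-words vectors can be sketched as follows. The paper specifies only d=2; the additive constant `coef0=1.0` and the dictionary representation of feature vectors are assumptions made for illustration.

```python
def dot(x, y):
    """Dot product of two sparse vectors stored as {feature: value} dicts."""
    if len(x) > len(y):
        x, y = y, x  # iterate over the shorter vector
    return sum(value * y.get(feature, 0.0) for feature, value in x.items())


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """K(x, y) = (x . y + coef0) ** degree."""
    return (dot(x, y) + coef0) ** degree


a = {"caught": 1.0, "the": 1.0}
b = {"the": 1.0, "have": 1.0}
print(polynomial_kernel(a, b))  # one shared word: (1 + 1) ** 2
```

With degree 2, the kernel implicitly scores pairs of co-occurring context words, which is one plausible reason it suits short tweets; the paper itself does not give this explanation.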


5 Experiments 


We assessed the detection performance using actu- 
al influenza reports provided by the Japanese IDSC. 


5.1 Comparable Methods


We compared the various methods as follows:
Figure 3: Window size and Accuracy (F-measure).
RIGHT shows a method that used only the right context. LEFT shows
a method that used only the left context. BOTH represents a method
using both the right and left context. The number shows the window
size. ∞ uses all words in each context direction.



































Classifier                                   F-measure  Training time (sec)
AdaBoost (Freund 1996)                       0.592       40.192
Bagging (Breiman 1996)                       0.739       30.310
Decision Tree (Quinlan 1993)                 0.698      239.446
Logistic Regression                          0.729      696.704
Naive Bayes                                  0.741        7.383
Nearest Neighbor                             0.695       22.441
Random Forest (Breiman 2001)                 0.729       38.683
SVM (RBF kernel) (Cortes and Vapnik 1995)    0.738       92.723
SVM (polynomial kernel; d=2)                 0.756       13.256

Table 2: Machine Learning Methods and Performance (F-measure and
Training Time)


• TWEET-SVM: The proposed SVM-based method (window size = 6).

• TWEET-RAW: A simple frequency-based method. This approach
  outputs the relative frequency of the word “influenza” appearing
  in Twitter.

• DRUG: The amounts of drug sales (sales of cold medicines).
  Statistics are provided by the Japanese Ministry of Health,
  Labor and Welfare.

• GOOGLE: Google flu trend detection (Japanese version,
  http://www.google.org/flutrends/). This method uses a query log
  of the Google search engine (Ginsberg et al., 2009).
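The TWEET-RAW baseline listed above can be sketched as a weekly relative frequency. This is a minimal illustration under assumptions: the denominator (all crawled tweets in the same week), the ISO week grouping, and the function name are hypothetical, since the paper does not define them.

```python
from collections import Counter
from datetime import date


def weekly_relative_frequency(matching_days, all_days):
    """Map (ISO year, ISO week) to matching tweets / all tweets that week.

    matching_days: posting dates of tweets that contain the keyword.
    all_days: posting dates of every crawled tweet.
    """
    matches = Counter(d.isocalendar()[:2] for d in matching_days)
    totals = Counter(d.isocalendar()[:2] for d in all_days)
    return {week: matches[week] / total
            for week, total in sorted(totals.items())}


all_days = [date(2009, 1, 5), date(2009, 1, 6),
            date(2009, 1, 12), date(2009, 1, 13)]
matching_days = [date(2009, 1, 5), date(2009, 1, 12), date(2009, 1, 13)]
print(weekly_relative_frequency(matching_days, all_days))
```

Under the same assumptions, a TWEET-SVM style series would pass only the dates of tweets that the classifier labels positive as `matching_days`.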





5.2 Gold Standard and Test-Set 


For gold standard data, we used data that are de- 
scribed in Section 2, as reported from IDSC. The 
report is released once a week. Therefore, the 
evaluation is done on a weekly basis. 
We split the data into four seasons as follows: 

Season I: winter 2008, 

Season II: summer 2009,

Season III: winter 2009, 

Season IV: summer 2010. 


For a more detailed evaluation, we split the winters into two
sub-seasons: before the peak and after the peak. We regard the
peak point as the day with the highest number in that season. The
statistics derived from the data are presented in Table 3.
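The before/after-peak split can be sketched as follows. This is an illustration only: placing the peak data point in the "before" part is an assumption, and the function name is hypothetical.

```python
def split_at_peak(values):
    """Split a season's series into (before_peak, after_peak).

    The peak is the first position holding the maximum value. It is kept
    in the first part (an assumption; the paper does not say which side
    the peak point belongs to).
    """
    if not values:
        raise ValueError("values must not be empty")
    peak = max(range(len(values)), key=values.__getitem__)
    return values[:peak + 1], values[peak + 1:]


before, after = split_at_peak([1, 3, 7, 4, 2])
print(before, after)
```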


Excessive News Period: In our experimental data, Season II and
Season III before the peak are special periods because news
related to swine flu (H1N1 flu) was extremely prominent in those
periods (Fig. 4). This paper calls them Excessive News Periods.
We also investigated the results with and without the excessive
news periods.


Figure 4: A CNN news article on “swine flu” in June 2009 (Season
II in our experiment), headlined “Swine flu 'not stoppable,' World
Health Organization says”.
Experimental data include such excessive news periods.


5.3 Evaluation Metric 


The evaluation metric is based on correlation 
(Pearson correlation) between the gold standard 
value and the estimated value. 
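The metric can be written out as a short function. This is a plain-Python sketch of the standard Pearson correlation coefficient; the function name and the error handling are choices made for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Weekly gold-standard values against weekly estimates (toy numbers).
print(pearson([1, 2, 3], [2, 4, 6]))
```

Because the coefficient is invariant to scaling and shifting, the estimated series (for example, a relative tweet frequency) does not need to be converted to patient counts before the comparison.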
5.4 Result


The results are presented in Table 4. In the non- 
excessive news period, the proposed method 
achieved the highest performance (0.890 correla- 
tion). This correlation is considerably higher than 
the query-based approach (GOOGLE), demonstrat- 
ing the basic feasibility of the proposed approach. 
However, during the excessive news periods, the 
proposed method suffers from an avalanche of 
news, generating a news bias. This phenomenon is 
a remaining problem to be resolved in future stud- 
ies. 


6 Discussion 


6.1 SVM-based Negative Filtering contributes to Performance


In most seasons, the proposed SVM approach (TWEET-SVM) shows
higher correlation than the simple word lookup method
(TWEET-RAW). The average improvement is 0.196 (max 0.56; min
-0.009), which significantly boosts the correlation. This result
demonstrates the basic feasibility of the proposed approach. In
the future, more advantages attributable to the proposed approach
can be obtained if the classification performance improves.
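The average improvement quoted above can be reproduced from the Table 4 correlations, assuming that it is taken over the four whole-season rows (Season I to Season IV). That reading is an assumption: the whole-season rows for Season I and Season III are unlabeled in the extracted table, and the variable names below are illustrative.

```python
# (TWEET-RAW, TWEET-SVM) correlations per whole season, from Table 4.
TABLE4 = {
    "Season I": (0.683, 0.816),
    "Season II": (-0.009, -0.018),
    "Season III": (0.382, 0.474),
    "Season IV": (0.391, 0.957),
}

# Improvement of TWEET-SVM over TWEET-RAW in each season.
gains = {season: svm - raw for season, (raw, svm) in TABLE4.items()}
mean_gain = sum(gains.values()) / len(gains)

for season, gain in gains.items():
    print(f"{season}: {gain:+.3f}")
print(f"mean {mean_gain:.4f}, max {max(gains.values()):.3f}, "
      f"min {min(gains.values()):.3f}")
```

The per-season gains are 0.133, -0.009, 0.092, and 0.566, whose mean is about 0.1955, in line with the reported 0.196, maximum, and minimum.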


6.2 All Methods Suffer from News Bias in 
Excessive News Period 


All methods perform poorly during the excessive news period (from
Season II to Season III before the peak). In particular, the
tweet-based methods show dramatically reduced correlation, which
indicates that Twitter is vulnerable to newswire bias.

One reason for that vulnerability is that Twitter 
is a kind of communication tool by which a tweet 
affects other people. Consequently, the possibility 
exists that a few tweets related to “flu” might 
spread widely, generating an explosive burst of 
influenza-related tweets. Future studies must ad- 
dress this burst phenomenon. 





Track            Dates                    Weeks      News period
All seasons                               79 (0.221)
Season I         2008/11/9 - 2009/4/5     22 (0.423)  Non-excessive
  Before peak    2008/11/9 - 2009/1/25    12 (0.576)  Non-excessive
  After peak     2009/2/1 - 2009/4/5      10 (0.632)  Non-excessive
Season II        2009/4/12 - 2009/7/5     13 (0.553)  Excessive
Season III       2009/7/12 - 2010/2/14    26 (0.388)
  Before peak    2009/7/12 - 2009/11/29   15 (0.514)  Excessive
  After peak     2009/12/6 - 2010/2/14    11 (0.602)  Non-excessive
Season IV        2010/2/21 - 2010/7/4     18 (0.468)  Non-excessive

Table 3: Test-set Tracks and the number of data points (weeks).
The number in parentheses indicates the statistical significance
level.















































Track                        TWEET-RAW  TWEET-SVM    DRUG    GOOGLE
                                        (Proposed
                                        Method)
Excessive news period         0.001      0.060       0.844   0.918
Non-excessive news period     0.831      0.890       0.308   0.847
Season I                      0.683      0.816      -0.208   0.817
  Before peak                 0.914      0.974      -0.155   0.962
  After peak                  0.952      0.955       0.557   0.959
Season II                    -0.009     -0.018       0.406   0.232
Season III                    0.382      0.474       0.684   0.881
  Before peak                 0.390      0.474       0.919   0.924
  After peak                  0.960      0.944       0.364   0.936
Season IV                     0.391      0.957       0.130   0.976

Table 4: Results (Correlation Ratio).
The number in bold indicates a significant correlation (p=0.05).
The number with underline indicates the highest value in each
season.


6.3 Tweets have Advantages in Early Stage 
Detection 


From practical viewpoints, the most important task 
is to detect influenza epidemics before the peak 
(early stage detection). Consequently, the correla- 
tion of the two seasons, Season I before the peak 
and Season III before the peak, presents the practi- 
cal performance. Figure 5 portrays detailed results 
of all methods. 

In Season I before the peak (Figure 5 Left), the 
proposed method (TWEET-SVM) shows the best 
performance among all methods. 
In Season II before the peak (Figure 5 Right), all methods
including the proposed method showed poor correlation because
this period falls within the excessive news periods. During that
season, the newswires heavily reported the swine flu twice (April
2009 and May 2009). Because of this news, we can see two peaks in
the Twitter-based methods (TWEET-SVM and TWEET-RAW), which
indicates that Twitter is more sensitive to the newswires.


Figure 5: Predicted Values in Season I (Left) and Season II
(Right): the X-axis shows the date (ticks from 2008/11 to
2009/05); the Y-axis shows the relative predicted value using
each method. The legend includes gold standard, tweet-SVM, and
tweet-RAW.
Figure 6: Patient Actions (Web Search Query and Tweet) are Sensitive before the Epidemic Peaks.
Distribution between the gold standard and Detected Values (Search Engine Query (Left) and Tweet (Right)): “+” denotes the
distribution before the peak; “-” denotes the distribution after the peak.


6.4 Human Action is Sensitive before Epidemics


Figure 6 presents the distribution between the detected values
(using GOOGLE and using TWEET-SVM) and the gold standard value
(values before the peak are shown by “+”; those after the peak
are shown by “-”). Although the detected values fundamentally
correlate with the gold standard, we can see different
sensitivity before and after the peak (the distribution before
the peak, “+”, shows higher values than that after the peak,
“-”).

These results show that human actions, a web search (GOOGLE) and
a tweet (TWEET-SVM), respond more strongly to actual influenza
activity before the epidemic peak than after it. More accurate
detection might be possible if we incorporate a model that
considers this aspect of human nature.
7 Related Works


The core technology of the proposed method is to classify whether
the event is positive or negative. This task is similar to
negation identification, which is a traditional topic, especially
in medical fields. Therefore, we can find many previous studies
of the topic in the relevant literature. These include
algorithm-based approaches such as NegEx (Chapman et al., 2001),
Negfinder (Mutalik et al., 2001), and ConText (Chapman et al.,
2007), as well as machine-learning-based approaches (Elkin et
al., 2005; Huang and Lowe, 2007).




















Sentence                            Previous:          This study:
                                    Negation           Negative Influenza
                                    (Syntactic)        (Semantic)
I caught a flu.                     Positive sentence  Positive Influenza
I don’t have the flu!               Negative sentence  Negative Influenza
I have enough flu drugs.            Positive sentence  Negative Influenza
I have not recovered from the flu.  Negative sentence  Positive Influenza

Table 5: Our target influenza negation (semantic) and previous
negation (syntactic)


Although these approaches specifically examine syntactic
negation, this study detects negative influenza, which is a
specific kind of semantic negation. Table 5 presents the
difference between the two kinds of negation. In general,
semantic operations are difficult. However, this paper revealed
that a domain-specific (influenza domain) semantic operation
provides reasonable results.
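The gap between the two kinds of negation can be illustrated with a toy syntactic check applied to the four sentences of Table 5. This is not NegEx, Negfinder, or ConText; the cue list and function name are assumptions chosen only to show where a purely syntactic test disagrees with the influenza labels.

```python
import re

NEGATION_CUES = {"not", "no", "never"}  # hypothetical, minimal cue list


def has_syntactic_negation(sentence):
    """True if the sentence contains a negation cue or an n't contraction."""
    text = sentence.lower().replace("’", "'")
    tokens = re.findall(r"[a-z']+", text)
    return any(t in NEGATION_CUES or t.endswith("n't") for t in tokens)


# (sentence, influenza label from Table 5: True = positive influenza)
TABLE5 = [
    ("I caught a flu.", True),
    ("I don’t have the flu!", False),
    ("I have enough flu drugs.", False),
    ("I have not recovered from the flu.", True),
]

for sentence, positive_influenza in TABLE5:
    negated = has_syntactic_negation(sentence)
    # Syntactic and semantic views agree when a negated sentence is a
    # negative influenza tweet, or a non-negated one is positive.
    agrees = negated != positive_influenza
    print(f"{sentence!r}: negated={negated}, agrees={agrees}")
```

The last two sentences are the ones where the syntactic test and the influenza label disagree, which is the case that the sentence classifier is meant to handle.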

Another aspect of this study is the target mate- 
rial, Twitter data, which have drawn much atten- 
tion. Twitter can provide suitable material for 
many applications such as named entity recogni- 
tion (NER) (Finin et al., 2010) and sentiment 
analysis (Barbosa and Feng, 2010). Although these 
studies specifically examine the fundamental NLP 
techniques, this study directly targets an NLP ap- 
plication that can contribute to our daily life. 


8 Conclusion 


This paper proposed a new Twitter-based influenza epidemic
detection method, which relies on Natural Language Processing
(NLP). Our proposed method could successfully filter out the
negative influenza tweets (F-measure=0.76), which are posted by
people who did not actually catch influenza. The experiments with
the test data empirically demonstrate that the proposed method
detects influenza epidemics with high correlation (correlation
ratio=0.89), which outperforms the state-of-the-art Google
method. This result shows that Twitter texts precisely reflect
the real world, and that NLP techniques can extract useful
information from Twitter streams.
Figure 7: An influenza surveillance system “INFLU kun” using the
proposed method is available at http://mednlp.jp/influ/.
Figure 8: The Timeline of Influenza Epidemics in Fukushima. While
reports from the Infectious Disease Surveillance Center (IDSC)
(gold standard) were sometimes interrupted due to the Great East
Japan Earthquake, the proposed system (Our System) could continue
to work.


Available Resources 

Corpus: The corpus of this study is provided at
http://mednlp.jp/~aramaki/KAZEMIRU/.

Web System: The web service is also released at 
http://mednlp.jp/influ/ (Figure 7 and Figure 8). 